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AN IMPLEMENTATION OF THE LOOK-AHEAD LANCZOS 
ALGORITHM FOR NON-HERMITIAN MATRICES* * * § 

ROLAND W. FREUNDfi MARTIN H. GUTKNECHTJ, and NOfeL M. NACHTIGAL§ 


Abstract. The nonsymmetric Lanezos method can be used to compute eigenvalues of large 
sparse non-Hermitian matrices or to solve large sparse non-Hermitian linear systems. However, the 
original Lanezos algorithm is susceptible to possible breakdowns and potential instabilities. We 
present an implementation of a look-ahead version of the Lanezos algorithm that — except for the 
very special situation of an incurable breakdown — overcomes these problems by skipping over those 
steps in which a breakdown or near-breakdown would occur in the standard process. The proposed 
algorithm can handle look-ahead steps of any length and requires the same number of matrix- vector 
products and inner products as the standard Lanezos process without look-ahead. 

Key words. Lanezos method, orthogonal polynomials, look-ahead steps, eigenvalue problems, 
iterative methods, non-Hermitian matrices, sparse linear systems 

AMS(MOS) subject classifications. 65F15, 65F10 

1. Introduction. In 1950, Lanezos [19] proposed a method for successive reduc- 
tion of a given, in general non-Hermitian, N x N matrix A to tridiagonal form. More 
precisely, the Lanezos procedure generates a sequence H^ n \ n = 1,2,..., of n x n 
tridiagonal matrices which, in a certain sense, approximate A. Furthermore, in exact 
arithmetic and if no breakdown occurs, the Lanezos method terminates after at most 
Jj (< AT) steps with a tridiagonal matrix which represents the restriction of A or 
A^ to an A-invariant or A ^-invariant subspace of C^, respectively. In particular, all 
eigenvalues of H&) are also eigenvalues of A, and, in addition, the method produces 
basis vectors for the A-invariant or A T -invariant subspace found. 

In the Lanezos process, the matrix A itself is never modified and appears only 
in the form of matrix-vector products A • v and A T * w . Because of this feature, the 
method is especially attractive for sparse matrix computations. Indeed, in practice, 
the Lanezos process is mostly applied to large sparse matrices A, either for computing 
eigenvalues of A or — in the form of the closely related biconjugate gradient (BCG) 
algorithm [20] — for solving linear systems Ax = 6. For large A, the finite termination 
property is of no practical importance and the Lanezos method is used as a purely 
iterative procedure. Typically, the spectrum of offers good approximations to 
some of the eigenvalues of A after already relatively few iterations, t.e., for n <C N. 
Similarly, BCG — especially if used in conjunction with preconditioning — often 
converges in relatively few iterations to the solution of Ax = 6. 

Unfortunately, in the standard nonsymmetric Lanezos method a breakdown — 
more precisely, division by 0 — may occur before an invariant subspace is found. In 
finite precision arithmetic, such exact breakdowns are very unlikely; however, near- 
breakdowns may occur which lead to numerical instabilities in subsequent iterations. 
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The possibility of breakdowns has brought the nonsymmetric Lanczos process into 
discredit and has certainly prevented many people from using the algorithm on non- 
Hermitian matrices. The symmetric Lanczos process for Hermitian matrices A is a 
special case of the general procedure in which the occurrence of breakdowns can be 
excluded. 

On the other hand, it is possible to modify the Lanczos process so that it skips over 
those iterations in which an exact breakdown would occur in the standard method. 
The related modified recurrences for formally orthogonal polynomials were mentioned 
by Gragg [13, pp. 222-223] and by Draux [7]; also, in the context of the partial 
realization problem, by Rung [18, Chapter IV] and Gragg and Lindquist [14]. However, 
a complete treatment of the modified Lanczos method and its intimate connection 
with orthogonal polynomials and Pade approximation was presented only recently, by 
Gutknecht [15, 16]. Clearly, in finite-precision arithmetic, a viable modified Lanczos 
process also needs to skip over near-breakdowns. Taylor [26] and Parlett, Taylor, and 
Liu [24], with their look-ahead Lanczos algorithm, were the first to propose such a 
practical procedure. However, in [26, 24], the details of an actual implementation are 
worked out only for look-ahead steps of length 2. We will use the term look-ahead 
Lanczos method in a broader sense to denote extensions of the standard Lanczos 
process which skip over breakdowns and near-breakdowns. Finally, note that, in 
addition to [15, 16], there are several other recent papers dealing with various aspects 
of look-ahead Lanczos methods (see [1, 2, 4, 5, 8, 12, 17, 22]). 

The main purpose of this paper is to present a robust implementation of the 
look-ahead Lanczos method for general complex non- Hermitian matrices. Our inten- 
tion was to develop an algorithm which can be used as a black box. In particular, 
the code can handle look-ahead steps of any length and is not restricted to steps of 
length 2. On many modern computer architectures, the computation of inner products 
of long vectors is a bottleneck. Therefore, one of our objectives was to minimize the 
number of inner products in our implementation of the look-ahead Lanczos method. 
The proposed algorithm requires the same number of inner products as the classical 
Lanczos process, as opposed to the look-ahead algorithm described in [26, 24], which 
always requires additional inner products. In particular, our implementation differs 
from the one in [26, 24] even for look-ahead steps of length 2. 

The outline of the paper is as follows. In Section 2, we recall the standard 
nonsymmetric Lanczos method a nd its close r elationship with orthogonal polynomials. 
Using this connection, we then describe the basic idea of the look-ahead versions of 
the Lanczos process. In Section 3, we present a sketch of our implementation of the 
algorithm with look-ahead and some of its basic properties. In Section 4, we discuss in 
more detail issues related to the look-ahead feature of the algorithm, while in Section 5 
we are concerned with issues related to the implementation of the algorithm. Finally, 
in Section 6, we report a few numerical experiments with the algorithm, both for 
eigenvalue problems and linear systems, and in Section 7, we make some concluding 
remarks. 

We remark that an extended version of the present paper is available as a technical 
report [10]. In particular, details omitted here can be found therein. Furthermore, 
the look-ahead Lanczos process can be used to compute approximate solutions to 
Ax - 6, solutions which are defined by a quasi-minimal residual (QMR) property. 
The resulting QMR algorithm is described in detail in [11, 12]. 

For notation, we will adhere to the Householder conventions, with only a few 
exceptions which we will note. Throughout the paper, all vectors and matrices can be 
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assumed to be complex. As usual, M r = and M H = (/I^) denote the transpose 
and the conjugate transpose, respectively, of the matrix M = (wj)> The largest and 
smallest singular value of M is denoted by <r max (M ) and <r m j n (Af), respectively. The 
vector norm is always the Euclidean norm and \\M\\ = ( M ) denotes 

the corresponding matrix norm. The notation 

Af n (c, B) := span {c, Be, . . . , 5 n-1 c} 

is used for the nth Krylov subspace of generated by c € and the N x N matrix 
B . 

V n ~ {*(A) = To + 7 iA + ■ • - + TnA n | To,Ti»-- - »7« € C) 

denotes the set of all complex polynomials of degree at most n. Furthermore, A is 
always assumed to be a possibly complex and in general non-Hermitian N x N matrix. 

Finally, we note that in our formulation of the nonsymmetric Lanczos algorithm 
and its look-ahead variant, we use A T rather than A H . This was a deliberate choice 
in order to avoid complex conjugation of the scalars in the recurrences; the algorithms 
can be formulated equally well in either terms (cf. (2.18)). 

2. Background. In this section, we briefly recall the classical nonsymmetric 
Lanczos method [19] and its close relationship with formally orthogonal polynomials 
(FOPs hereafter). Using this connection, we then describe the basic idea of the look- 
ahead Lanczos algorithm. 

Given two nonzero starting vectors v\ £ and w\ £ the standard non- 
symmetric Lanczos method generates two sequences of vectors {v n }£ =1 and {u>n}n=i 
such that, for n = 1 

span {vi, t? 2 , . - • ,v n } = AT n (t/i,j4), 

^ span {u?i, u> 2 , . - • > = K n (wi } A T ), 


and 

(2.2) 


r Si ^ o if i = j , 

\ 0 otherwise, 


for all i, j = 1, . . . , n. 


The actual construction of the vectors v n and w n is based on the three-term recurrences 


(2.3) 


Un + l — Oc n V n — 1j 

w n +i - A T w n - a n w n - j3 n w n -x, 


where 




6 n 


and 


w^_ l Av n 
Sn- 


6 n 

&n - 1 


are chosen to enforce (2.2). For n = 1, we set 0\ = 0 and vq — wq — 0 in (2.3). Letting 


(2.4) y("> := [ui t/ 2 t7„] and := (u;i w 3 ••• u/„] 


denote the matrices whose columns are the first n of the vectors Vj and wj , respectively, 
and letting 

fa, 02 0 ••• 0] 


tf (n) := 


1 C*2 

o 


0 


0 


0n 

0 1 atn. 
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denote the tridiagonal matrix containing the recurrence coefficients, we can rewrite 
(2.3) as 

^(n)- y(n)tf(") + [0 ... Q „ n+1 ] , 

^ 2 ' 5 ^ A T W (n '> = W^ (n) i/ (n) + [0 ••• 0 U>n+1 ] • 

Moreover, the biorthogonality condition (2.2) reads as 

(2.6) [w^)) T V^ = D (n) := diag {6 l , <5 2 , . . . , S n ). 

Let L be the largest integer such that there exist vectors v n and w n , n = 1, . . . , L t 
satisfying (2.1) and (2.2). Note that L <N and that, in view of (2.3), L is the smallest 
integer such that 

(2.7) + + l = 0* 

Moreover, let 

L r = L r {v\, A) := dim/fw(vi,^4) and L\ = Li(wi,A t ) := dimi<Gv(tt>i,>l T ) 

denote the grade of V\ with respect to A and the grade of w\ with respect to A T , 
respectively (cf. [28, p. 37]). There are two essentially different cases for fulfilling 
the termination condition (2.7). The first case, referred to as regular termination, 
occurs when = 0 or = 0. If v L ^ l = 0, then L = L r and the right Lanczos 
vectors t?i, . . . , vl t span the A-invariant subspace KL r (vi, A). Similarly, if u; L+1 = 0, 
then L = L\ and the left Lanczos vectors W\ , . . . , wl, span the ^-invariant subspace 
KhiwuA 7 ). Unfortunately, it can also happen that the termination condition (2.7) 
is satisfied with v L + x ^ 0 and w L + x ^ 0. This second case is referred to as serious 
breakdown [28, p. 389]. Note that, in this case, 

L < L+ := min {L/, L r ] 

and the Lanczos vectors span neither an A-invariant nor an ^-invariant subspace of 

C N . 

It is the possibility of serious breakdowns, or, in finite precision arithmetic, of 
near-breakdowns , i.e., 

u>n+i v n+i « 0, but w n+i 0 and v n+i jt 0, 

that has brought the classical nonsymmetric Lanczos algorithm into discredit. How- 
ever, by means of a look-ahead procedure, it is possible to leap (except in the very 
special case of an incurable breakdown [26]) over those iterations in which the stan- 
dard algorithm would break down. Next, using the intimate connection between the 
Lanczos process and FOPs, we describe the basic idea of the look-ahead Lanczos 
algorithm. 

First, note that 


( 2 . 8 ) 


K n {v u A)= | * 

K n {w lt A T ) = {*(. 4 t ) u>i | € V n -i}. 
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In particular, in view of (2.3), for n = l f ...» L, 

(2.9) v n = tf n _i(.4)vi and w n = V n -i(A T )w u 

where ^ n -i € V n -\ is a uniquely defined monic polynomial. Then, introducing the 
formal inner product 

(2.10) (<$, tf) := ($(.A T )u>i)' r (^(^ri) = u;f*(i4)«(i4)i;i 

and using (2.1), (2.8), and (2.9), we can rewrite the biorthogonality condition (2.2) in 
terms of polynomials: 

(2.11) (*„_!,*) = 0 for all ^ E V n -2 
and 

(2.12) (tf n -i,tf„-i) 5*0. 

Note that, except for the Hermitian case, i.e., A = A H and w\ = TFi t the formal inner 
product (2.10) is indefinite. Therefore, in the general case, there exist polynomials 
^ ^ 0 with “length” (tf, tf) = 0 or even (¥, tf) < 0. 

A polynomial G V n -u ^ 0, that fulfills (2.11) is called a FOP (with 
respect to the formal inner product (2.10)) of degree n — 1 (see, e.g., [3], [7], [15]). 
Note that the condition (2.11) is empty for n = 1, and hence any = To ^ 0 is a 
FOP of degree 0. From (2.11), 

— l(A) = 7o + 7l ^ + H7n-lA n 1 

is a FOP of degree n — 1 if, and only if, its coefficients 70 , ■ * * »7n-i are a nontrivial 
solution of the linear system 

/io ^1/^2 ••• ^n-2 

Pi • 

Pi : 

: •’ /^2n — 5 

Pn - 2 *** ^2n-5 ^2n-4 

Here 

:= WiA*v x = (1, A J ), j = 0,1,.-. , 

are the moments associated with (2.10). A FOP is called regular if it is uniquely 

determined by (2.11) up to a scalar, and it is said to be singular otherwise. We remark 
that FOPs of degree 0 are always regular. With (2.13), one easily verifies that a regular 
FOP has exactly degree n — 1. In particular, a regular FOP is unique if it is 

required to be monic. Moreover, singular FOPs occur if, and only if, the corresponding 
linear system (2.13) has a singular coefficient matrix, but is consistent. If (2.13) is 
inconsistent, then no FOP ^n-i exists. This case is referred to as deficient. By 
relaxing (2.11) slightly, one can define so-called deficient FOPs (see [15] for details). 
Simple examples (see, e.g. f [11, Section 13]) show that the singular and deficient cases 
do indeed occur. Thus, regular FOPs need not exist for every degree n — 1. We 
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would like to stress that this phenomenon is due to the indefiniteness of (2.10). For 
a positive definite inner product (■,•), unique monic formally orthogonal polynomials 
always exist, up to the degree equal to the grade of t»i with respect to A. Such a 
definite inner product is induced in the Hermitian case A - A H and u»i = t»i. In 
this case, the FOPs are true orthogonal polynomials with respect to a positive weight 
whose support is a set of points on the real axis (see, e.g., [25]). In addition, they 
have real coefficients and therefore 

(¥, ¥) = wf ¥(A)* (A)t7, = v^W{A H )^(A)v l = ||*04>i || 2 • 

Finally, given a regular FOP 'Fn-i> it is easily checked whether a regular FOP of 
degree n exists. Indeed, using (2.13), one readily obtains the following 

Lemma 2.1. Let V n -i be a regular FOP (with respect to the formal inner product 
(2.10)) of degree n- 1. Then, a regular FOP of degree n exists if, and only if, (2.12) 
is satisfied. 

Let us return to the standard nonsymmetric Lanczos process (2.3). Using (2.7), 
(2.9), (2.10), and Lemma 2.1, we conclude that a serious breakdown occurs if, and 
only if, no regular FOP exists for some L < L*. In this case, the termination index L 
is the smallest integer L for which there exists no regular FOP of degree L. 

On the other hand, there is a maximal subset of indices 

(2.14) {ni,n 2) ... ,n/} C {1,2, ... ,L*}, := 1 < n 2 < ■ ■ ■ < nj < L*, 

such that, for each j = 1, 2, . . . , J, there exists a monic regular FOP ^ n> -i e ?V-i' 
Note that n a = 1 since 'Fo(A) = 1 is a monic regular FOP of degree 0. It is well 
known [7, 14] that three successive regular FOPs V„.-u and *„ i+1 -i are 

connected via a three-term recurrence. Consequently, setting, in analogy to (2.9), 

i; = VC^-iCAjvi and w n , = 'H n ._i(A T )w u 

we obtain two sequences of vectors {t>„ i }/=i an< ^ {“V }/=i can com puted 

by means of three-term recurrences. These vectors will be called regular vectors, 
since they correspond to regular FOPs. Note that the starting vectors v\ and w\ 
are always regular. The look-ahead Lanczos procedure is an extension of the classical 
nonsymmetric Lanczos algorithm; in exact arithmetic, it generates the vectors v n . and 
w , j = 1, . , . , J. If nj = L* in (2.14), then these vectors can be complemented to 

a basis for an ^-invariant or ^-invariant subspace of C N . An incurable breakdown 
occurs if, and only if, nj < L* in (2.14). Finally, note that 

w^v = w T v n ~0 for all v G A' n ^_ l (ui, A), w € K n ^_i(w\, A ), 

j — !,••• » J- 

The look-ahead procedure we have sketched so far only skips over exact break- 
downs. It yields what is called the nongeneric Lanczos algorithm in [15]. Of course, 
in finite precision arithmetic, the look-ahead Lanczos algorithm also needs to leap 
over near-breakdowns. Roughly speaking, a robust implementation should attempt 
to generate only the “well-defined” regular vectors. In practice, then, one aims at 
generating two sequences of vectors {v njk }f =1 and {u7 nij> }f_, where 

{ n Jii }* = i Q { n ; }/=i 


(2.15) 


h ~ 1 . 
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is a suitable subset of (2.14). We set j\ = 1, since i>i and wi are always regular. The 
problem of how to determine the set (2.15) of indices of the “well-defined” regular 
vectors will be addressed in detail in Section 4. 

In order to obtain complete bases for the subspaces K n (v\ t A ) and K n (uJi i A T ) i 
we need to add vectors 

v n £ K n (vi,A)\K„-i(vi,A) and w n £ K n (w 1 ,A T )\K n -i{w 1 ,A T ), 

( } n = n jlk _ 1 + l,...,n i4 -l l Jfc = 2,3, . . . ,K, 

to the two sequences {un> 4 }^i and respectively. Clearly, (2.16) guar- 

antees that (2.1) remains valid for the look-ahead Lanczos algorithm. The vec- 
tors in (2.16) are called inner vectors. Moreover, for each fc, the vectors v n , n = 
nj h , rij h + 1 ,... , n ; - fc+1 - 1 , and correspondingly for w n , are referred to as the k th 
block. The inner vectors of a block built because of an exact breakdown correspond to 
singular or deficient FOPs, while the inner vectors of a block built because of a near- 
breakdown correspond to polynomials which in general are combinations of regular, 
singular, and deficient FOPs. We will refer to both the regular and the inner vectors 
v„ and w n generated by the look-ahead variant as right and left Lanczos vectors, in 
analogy to the terminology of the standard nonsymmetric Lanczos algorithm. 

So far, we have not specified how to actually construct the inner vectors. The 
point is that the inner vectors can be chosen such that the v n ’s and w n y s from blocks 
corresponding to different indices k are still biorthogonal to each other. More precisely, 
with V( n ) and W defined as in (2.4), we have, in analogy to (2.6), 

(2.17) (W< b >) t V ( b) = £> (n) , n = n jt -l, l = 2,Z,...,K. 

Here, is now a nonsingular block diagonal matrix with l — 1 blocks of respective 
size (n^-rijjx (n ifc+l -n jk ), i = l,...,/-l. Similarly, (2.5) holds, for n = n u - 1, 
/ = 2, 3, . . . , K, where is now a block tridiagonal matrix with diagonal blocks of 
size (n Jfc+l - n jk ) x (n ifc+1 - J, k = 1, 1 (cf. (3.4-5)). 

There are two fundamentally different approaches for constructing inner vectors 
with the property (2.17). In both cases, inner vectors are first generated using a simple 
three-term recurrence. However, in the first approach, each inner vector in a block 
is then biorthogonalized against the previous block as soon as it is constructed. This 
variant will be called the sequential algorithm. In the second approach, all the inner 
vectors in a block are first constructed using the three-term recurrence, and then the 
entire block is biorthogonalized against the previous block and possibly, depending on 
the size of the current block, against vectors from blocks further back. This variant 
will be called the block algorithm . The sequential algorithm is more suitable for a serial 
computer, while the block algorithm is more suitable for a parallel computer. In this 
paper, we describe only the sequential algorithm and its implementation. A sketch 
of the block algorithm can be found in [10]. Details of an actual implementation and 
numerical results will be presented elsewhere. 

Finally, two more notes. First, the inner product (2.10) could have been defined 
as 

(2.18) (<*, <P) := (^(^W)" (¥04>i) = iZi7"$04)*0 4 )‘'i, 

and the algorithm can be formulated equally well in either terms. Second, in the 
rest of the paper, we will use the notation := rij k for the indices of the “well- 
defined” regular vectors. However, notice that there is no guarantee that the indices n* 
generated by the look-ahead Lanczos algorithm in finite precision arithmetic actually 
satisfy (2.15). 
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3. The sequential algorithm. In this section, we start the discussion of the se- 
quential Lanczos algorithm with look-ahead. We present a sketch of the algorithm and 
its basic properties, then discuss some aspects related to its practical implementation 
in the next two sections. 

First, we introduce some notation. As in the last section, n = 1,2,... denote the 
indices of the Lanczos vectors ti n and w n . The index k — 1,2,... is used as a counter 
for the blocks built by the look-ahead algorithm. Moreover, we always use / = /(n) to 
denote the index of the block which contains the Lanczos vectors v„ and w n . Recall 
that by nt we denote the indices of the computed regular vectors, which are always 
the first vectors in each block k. Thus, nj is the index of the last computed regular 
vector with index < n. We have n i — 1. Capital letters with subscript k denote the 
matrices containing quantities from block k. For example, 

V* :=[»„» t/»*+i ... t>ru +l _i] 

is the matrix whose columns are the Lanczos vectors from a completed block k. Capital 
letters with superscripts denote matrices containing quantities from steps 1 through 
n, same as in (2.4). With this notation, the matrix form of the sequential algorithm 
with look-ahead is similar to (2.5—6) 


AV^ = + [0 0 u„+i], 

A T W (n) = W (n) H in) + [ 0 ••• 0 u> n+ i], 

= D^ n \ 

D^diag^j, ...,*,)> h~W?V k , *=1,2,. ..,/ = /(»), 


(3.1) 

and 

(3.2) 

Here 

(3.3) 

is block diagonal, and the blocks <5 1; S 2 , . . . , $r-i are nonsingular. If n = n r+ i - 1, then 
the /th block, Si, in (3.3) is also nonsingular and it is called complete. In particular, 


In (3.1), 


(3.4) 


//<"> = 


and 

w nt+i 

can 

be computed and start a new block 

~oc i 

02 

0 

... o • 


72 

a 2 




0 



0 





•• A 


. 0 

... 

0 

7i a i- 



is an n x n block tridiagonal upper Hessenberg matrix with blocks of the form 


*♦ 

♦ . . 

... *■ 


•o ••• 

1 — 
• — < 

o 

1 




* * . 

0 

0 



, 7 k = 


: 

.0 ••• 

0 

1 ♦. 


.0 ••• 

0. 


(3.5) a* = 
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while the /Vs are in general full matrices. Note that here we violate the Householder 
conventions, by using small Greek letters to denote quantities which may be matrices. 
The justification is that in general the algorithm takes regular steps, and hence these 
quantities are usually scalars. Let h* := n*+i — n*, k = 1,2, . be the size of the 
jbth block. For jb < / = l(n) the matrices a*, /V and 7 * are of size A* x ft*, x h*, 
and hk x hk~ 1 , respectively. In general, however, the /th block need not be complete. 
Hence, the matrices aj, /?/, and 7 j corresponding to the current (/th) block are of size 
hi xht , hi - 1 x hi , and h/ x A/_i, respectively, where hi n+ 1 - nj. 

We will assume that the inner vectors in a block are generated using a three-term 
recursion of the form 

l>n + l — ■ Av n — CnVn ” * 7 n v n — lj 

(3.6) . 7 * 

u>n+i = iw„ - Cn - rj n w n _ 1 , 

where Cn and rj n are recursion coefficients and r} nk = 0, fc = 1,2, ... . One may 
choose these coefficients so that they remain the same from one block to the next 
and change only with respect to their index inside the block, n — n*, or one may 
choose these coefficients so that they change from one block to the next. For instance, 
one practical choice for the polynomials in ( 3 . 6 ) are suitably scaled and translated 
Chebyshev polynomials, so that the inner vectors are generated by the Chebyshev 
iteration [21]. In this case, the translation parameters could be adjusted using spectral 
information obtained from previous Lanczos steps. We do not necessarily advocate the 
use of fancy recursions in (3.6). From our experience, the algorithm we propose builds 
very small blocks, typically of size 2 or 3. Except for artificially constructed examples, 
the largest block we observed in test runs with “real-life” matrices was of size 4. It 
occurred for the SHERMAN5 matrix where out of 1500 steps, the algorithm built 2x2 
blocks 49 times, 3x3 blocks 7 times, and one 4x4 block (see [12, Example 2]). Hence, 
the recursion in ( 3 . 6 ) is not overly important, and in our experiments, we have used the 
recursion coefficients = 1 and, if n ^ n*, rj n = 1 . On the other hand, for the block 
version of the algorithm, where larger blocks are built, more attention needs to be paid 
to the recursion used. As indicated, details of the block algorithm will be presented 
elsewhere. Finally, one could consider orthogonalizing (in the Euclidean sense) the 
right respectively left Lanczos vectors within each block. However, for the blocks we 
have seen built, such an orthogonalization process did not lead to better numerical 
properties of the algorithm. Therefore, in view of the additional inner products which 
need to be computed, orthogonalizing within each block is not justified. 

In practice, for reasons of stability, one computes scaled versions of the right and 
left Lanczos vectors, rather than the “monic” vectors v n and tu n corresponding to 
monic FOPs. A proven choice (see [24], [26]) is to scale the Lanczos vectors to have 
unit length. We denote by v n and w n the scaled versions defined by 

:= WIKII and w n := u> n /||u> n |[, 


and more generally, we will denote by hat (") quantities containing or depending on 
the scaled vectors. For example, setting 


£ (w) := diag (IKK , IMI , . . . , ||v„||)tf < n >diag (1/ |H| , 1/ |M| 1/ INI), 


Hi n) := 


£(“) 

.Pn+ iejJ’ 


jKhijl 

IMI 


e n :=[0 ••• 0 if SR", 


pn + 1 : = 
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we can rewrite the first relation in (3.1) in terms of scaled vectors as follows: 


(3.7) 


A V (n) = V r(n+1) if( n) . 


With this note, we now present a sketch of the sequential Lanczos algorithm with 
look-ahead. 

Algorithm 3.1 (Sequential Lanczos algorithm with look-ahead). 

0) Choose t>i, € C N with ||t>ij| =J|t2>i|| = 1; 

Set Vi = ii, Wi = u»i, Si = V/[V X ; 

Set m = 1, / = 1, t>o = &0 — 0, Vo = Wo = 0, Pi = = 1/ 

For n — 1, 2, . . . : 

1 ) Decide whether to construct v„+i and w n+ i as regular or inner vectors 
and go to 2) or 3), respectively; 

2) (Regular step.) Compute 


(3.8) 


3) 


5 n +i ~ Av n - V, 6f 1 VVj r Av n - 

w n+1 = A T w n - W, 6r T V, T A T w„ - W,. 1 SrJ 1 V^ 1 A T w„ 

set n,+ 1 = n + 1, / = /+ 1, Vi = Wi = 0, and go to 4); 

(Inner step.) Compute 


(3.9) 


V n+ l = Ai) n - C nVn ~ (Vn/Pn) Vn-l ~ V/- 1 6 f-l ^1- l A V n , 
Wn +1 = A T W n ~ C,nW n — (f?n/£n) d'n-l ^1—1 A U>m 


4) Compute p „+ 1 = ||tin+i|| ™d £„+i = [|u>„ + i||; 
If Pn+i = 0 or £„+i = 0, stop; 

Otherwise , set 


Vn + l — Vn+l/ Pn + U ^n + 1 — W n + l/£n+l t 

(310) V, = [Vi v n+i ], W, = [W, w n+l ], 6, = Wf V , ; 

5) Add the nonzero elements of the nth column to H ^ 
and set (Hi n ^) n +\ t n = Pn+i- 

Note that, if t) n +i and t£ n +i are inner vectors, the size of the current incomplete 
block / is increased by 1; if they are regular vectors, then the /th block is complete 
and a new block, the (/ A l)st, is started, with v n +i and w n + 1 as its first vectors. 
Finally, we remark that, in view of (3.7), the nonzero elements of the nth column of 
occur as coefficients in the first recursion of (3.8) respectively (3.9). 

4. Building blocks. In this section, we discuss the criteria used to decide in 
step 1) of Algorithm 3.1 whether a pair of Lanczos vectors v n +x and w n +i is built 
as inner vectors or as regular vectors. We propose three criteria, namely (4.3), (4.4), 
and (4.5) below. If all three checks (4.3-5) are satisfied, then v n +\ and iv n +i are 
constructed as regular vectors, otherwise, they are constructed as inner vectors. Let 
us motivate these three criteria. 

First, recall (cf. (3.3)) that for v n +i and tu n +i to be built as regular vectors it is 
necessary that Si is nonsingular. Therefore, it is tempting to base the decision Regular 
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versus inner step” solely on checking whether Si is close to singular, and to perform a 
regular step if, and only if, 

(4.1) ^min (*/<»)) > tol, 

for some suitably chosen tolerance tol. For example, Parlett [22] suggests tol = 
or tol = e 1 / 3 , where e denotes the roundoff unit. Then (4.1) would guarantee that 
complete blocks of computed Lanczos vectors satisfy 

*min (h) > tol, 4 = 1,2. 

This, together with (3.3), would imply by [22, Theorem 10.1] that 

(4.2) <Tmin (V r(n) ) > and <r m in (t^ (n) ) > - 7 =, n = n t - 1 , Ar = 1 , 2 , ... . 

Y n Y n 

Since the columns of and are unit vectors, 0 - m in (V^) and <r mm (W^")) are 
a measure of the linear independence of these vectors; in particular, (4.2) would ensure 
that the Lanczos vectors remain linearly independent. However, in the outlined algo- 
rithm, the block orthogonality ( 3 . 2 - 3 ) is enforced only among two or three successive 
blocks, and in finite precision arithmetic, biorthogonality of blocks whose indices are 
far apart is typically lost. The theorem assumes that (3.2-3) hold for all indices, and 
without this, the theorem fails in finite arithmetic. We illustrate this with a simple 
example. 

Example 4-1- In Figure 4.1, we plot <r min (i/( n j) (dots), mini<jb<(( n ) (^min (5*)) 
(solid line), and y/n (dotted line), as functions of the iteration index 

n = 1,2,..., for a random 50 x 50 dense matrix. The theorem predicts that 

y/n <r min (VW) > min (<r min ($t))> 

1 <jfc< <(n) 

which is clearly not the case. 

As this simple example shows, the check (4.1) alone does not ensure that the 
computed Lanczos vectors are sufficiently linearly independent. In particular, if the 
look-ahead strategy is based only on criterion (4.1), the algorithm may produce within 
a block Lanczos vectors which are almost linearly dependent. When this happens, the 
check ( 4 . 1 ) usually fails in all subsequent iterations and thus the algorithm never 
completes the current block, ?.e., it has generated an artificial incurable breakdown. 

In addition, numerical experience indicates another problem with (4.1): for val- 
ues of tol which are “reasonably” larger than machine epsilon, the behavior of the 
algorithm is very sensitive with respect to the actual value of toL We also illustrate 
this with an example. 

Example 4.2. We applied the Lanczos algorithm to a nonsymmetric matrix A 
obtained from discretizing a 3-D partial differential equation (cf. Example 6.4 in 
Section 6 ). This example was run on a machine with e « 1.3E— 29. In the first case, 
we set tol = e l l 4 as 6.0E-08, while in the second case, we set tol = e 1 ^ 3 as 2.3E— 10. 
In Figure 4.2, we plot <r m in (<5/(n)) versus the iteration index n for the two runs, the 
dotted line for c l/A and the solid line for e 1/3 . In the first case, the algorithm starts 
building a block which it never closes, and the singular values clearly become smaller 
and smaller. Yet if tol is only slightly smaller, as in the second case, the algorithm 
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runs to completion, in this case solving the linear system to the desired accuracy, and 
thus indicating that the block built in the first case was not a true, but an artificial 
incurable breakdown. 

We note that the sensitivity of look-ahead procedures to the choice of tolerances, 
such as tol in (4.1), was also observed in [5]. However, no remedy for this phenomenon 
is given in [5]. Furthermore, we remark that the problem of generating almost linearly 
dependent vectors is not specific to the Lanczos biorthogonalization process. Indeed, 
similar effects can also occur in true orthogonalization methods (cf. [27]). 

Examples 4.1 and 4.2 clearly show that the decision “regular versus inner step” 
cannot be based on (4.1) alone. Instead, we propose to relax the check (4.1), so that 
it merely ensures that $i( n ) is numerically nonsingular, and to add the checks (4.4-5) 
below which guarantee that the computed Lanczos vectors remain sufficiently linearly 
independent. Hence, instead of (4.1), we check for 

(4.3) ^min (^f(n)) ^ 
where € denotes the roundoff unit. 

Our numerical experiments have shown that typically the algorithm starts to gen- 
erate Lanczos vectors which are almost linearly dependent, once a regular vector u n +i 
was computed whose component Av n E K n +\(vi , A) is dominated by its component 
in the previous Krylov space K n (v \ } A) (and similarly for t£> n +i)- In order to avoid 
the construction of such regular vectors, we check the /i-norm of the coefficients for 
Vi_i and Vi in (3.8); t>„ +i can be computed as a regular vector only if 

(4.4) £ |(fcWiilMi|Sn(il) and E < "(A). 

;=n I-i j =n i 

Here n(^4) is a factor depending on the norm of A ; we will indicate later how this 
factor is computed. Similarly, we check the /i-norm of the coefficients for W\-\ and 
W\ in (3.8); u> n +i can be computed as a regular vector only if 

(4.5) ^2 < n(«4) and ^ |(fif T V r | T d T tI»„); | < n(A). 

; = n i _ i j— n i 

The pair t) n +i and ti) n +i is built as regular vectors only if the checks (4.3-5) hold true. 

We need to indicate how n(A) is chosen in (4.4-5). Numerical experience with 
matrices whose norm is known indicates that setting n(d) = ||A|| is too strict and can 
result in artificial incurable breakdowns. A better setting seems to be n(A) — 10 ||A||, 
but even this is dependent on the matrix. In any case, in practice one does not know 
||yi[|, and there is also the issue of a maximal block size, determined by limits on 
available storage. To solve the problems of estimating the norms and a suitable factor 
n(A), as well as cope with limited storage and yet allow the algorithm to proceed as 
far as possible, we propose the following procedure. Suppose we are given an initial 
value for n(/t), based either on an estimate from the user (for example, n(A) from a 
previous run with the matrix A), or by setting 



n(v4) = max {H^iH, [j } • 
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Note that here A denotes the matrix actually used in generating the Lanczos vectors, 
thus including the case when we are solving a preconditioned linear system. We then 
update n(A) dynamically, as follows. In each block, whenever an inner vector is built 
because one of the checks (4.4-5) is not satisfied, the algorithm keeps track of the size 
of the terms that have caused one or more of (4.4-5) to be false. If the block closes 
naturally, then this information is not needed. If, however, the algorithm is about to 
run out of storage, then n(A) is replaced with the smallest value which has caused an 
inner vector to be built. The updated value of n(A) is guaranteed to pass the checks 
(4.4-5) at least once, and hence the block is guaranteed to close. This also frees up 
the storage that was used by the previous block, thus ensuring that the algorithm can 
proceed. 

5. Implementation details. We now turn to a few implementation details. 
In particular, we wish to show how one can implement the sequential algorithm with 
the same number of inner products per step as the classical Lanczos algorithm. For a 
regular step, one needs to compute Si, W? Av„, and W^. l Av„ in (3.8). For an inner 
step, one needs to compute W£_ 1 Av n in (3.9) and to update Si in (3.10). We will show 
that for a block of size h t , only 2h, inner products are required: 2/ij-l will be required 
to compute Si, and one inner product will be required to compute W / Av n . We will 
obtain W^-dtin without performing any inner products. To simplify the derivations, 
we will use the “monic” vectors v n and iu„. All quantities involving the scaled vectors 
v n and w n can be obtained from the corresponding quantities involving v n and w n 
simply by scaling. Finally, we remark that, using a similar argument as in (5.1) below, 
one easily verifies that 


wj Av n = V l T A T w n and Wf^Av,, = V,Z l A T w n . 

Therefore, the coefficients 6f T V l T A T w n and SfJ' l V,Z l A T w n , which occur in the recur- 
sions for the left Lanczos vectors in (3.8) or (3.9), can be generated from S t l Wj r Av„ 
and S~J l W^. l Av n , without computing any additional inner products. 

Consider first <5;. Using (2.9) and the fact that polynomials in A commute, we 
deduce that 

(5.1) wjvj = wf9i(A)9j(A)vi = u>f¥j(;4)¥,-(A)t;i = tnju,-. 

This shows that the matrix S\ is symmetric, and hence we only need to compute its 
upper triangle. 

We will now show that once the diagonal and first superdiagonal of Si have been 
computed by inner products, the remaining upper triangle can be computed by recur- 
rences. Let W{ and Vj be two vectors from the current block. Using (3.6) and the fact 
that the inner vectors from block / are orthogonal to the vectors from the previous 
block, we have 

wj Vj = wJ(Avj - 1 -Cj-iVj-i - ^i — i v i— 2) 

= (A t u><) T,, ;-i -(j-iwfvj-i - T)j-iwjvj-i 

- (u/< + i + C<u>i + ~ 0-»«f w i-i “ T li-i w T v i-' 2 

= wf +l Vj-i + Ci w T v j - 1 + W J- 1 - 1 — <J - 1 W 7 v i - 1 — - 1 W i 


14 ROLAND W. FREUND, MARTIN H. GUTKNECHT, AND NOEL M. NACHTIGAL 

Thus, wj Vj depends only on elements of Si from the previous two columns, and hence, 
with the exception of the diagonal and the first superdiagonal, can be computed with- 
out any additional inner products. Note that the recurrences and the orthogonality 
used in the above derivation are enforced numerically, and so computing wj Vj by the 
above recurrence should give the same results - up to roundoff - as computing the 
inner product directly. 4 

We will now show how to compute Wf Av n with only one additional inner product, 
while WjH l Av n can be obtained with no additional inner products. Consider wf Av n , 
for u;,- a vector from either the current or the previous block. 

wf Av n = (A T Wi) T v n = (tu,*+i + CiWi + tftWi- i) T v n 
= wf^Vn +dwjv n + TJiwf^ x V n . 


For t < nj — 1, W^ t v n = 0, and hence wf Av n = 0. For i = n\ — 1, the above 
reduces to w* X A v n — w^v ni which is computed as part of the first row of <5/. For 
m < i < 1 , all of the terms needed are available from Si, Finally, for the last vector 

in the current block, i = nr+i - 1, we do not have tyj l+i v„, and hence have to compute 
it directly, thus requiring another inner product. 


6. Numerical examples. We have performed extensive numerical experiments 
with our implementation of the look-ahead Lanczos algorithm, both for eigenvalue 
problems and for the solution of linear systems. In this section, we present a few 
typical results of these experiments. Further numerical results are reported in [11] 
and [12]. 

Approximations to the eigenvalues of A can be obtained from the look-ahead 
Lanczos algorithm by computing some or all of the eigenvalues of the Lanczos matrix 
#( n \ the so-called Ritz values. In general, spurious approximate eigenvalues can oc- 
cur among the Ritz values. We have used the heuristic due to Cullum and Willoughby 
[6] to identify and eliminate spurious Ritz values. Although this procedure was origi- 
nally proposed for the scalar tridiagonal matrices generated by the standard Lanczos 
process, we also found it to work satisfactorily for the block tridiagonal matrices 
produced by the look-ahead Lanczos algorithm. The eigenvalues of were always 
computed using standard EISPACK routines. 

For the solution of nonsingular linear systems 


(6.1) Ax = b , 

we combine the look-ahead Lanczos algorithm with the QMR approach. More pre- 
cisely, let x 0 E be any initial guess for (6.1) and choose the normalized starting 
residual vector 

tii := r 0 /po, r 0 := 6 — Ax 0 , po := ||r 0 || , 

as the first right Lanczos vector in Algorithm 3.1. The QMR method then generates 
approximate solutions to (6.1) defined by 

(6.2) x n = x 0 + V^z n , n = 1,2,..., 
where z n is the solution of the least squares problem 


(6.3) 


min po e i — Hi n) z\\ , 


e,:=[l 0 - - 0] T € R (n+1) . 


Hi S--T1 
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We remark that, using (3.7), one easily verifies that the residual vector corresponding 
to the iterate (6.2) satisfies 

(6.4) r n := 6 - Ax n = V ,(n+1) (pod - f^ n) *n) • 


Thus the choice (6.3) of z n just guarantees that the Euclidean norm of the coefficient 
vector in the representation (6.4) is minimal. For details and further properties of the 
QMR method, we refer to [12]. 

Example 6.1. This examples is an eigenvalue problems, taken from [6]. Consider 
the differential operator 


(6.5) 


+ 20(r + y)|U 20^ ((* + y)u) + y- 


+ x + y 


on the unit square (0, 1) x (0, 1). We discretize (6.5) using centered differences on 
a 29 x 29 grid with mesh size h = 1/30. This leads to a nonsymmetric matrix of 
order N = 900. In Figure 6.1, we plot the Ritz values (marked by “o”) generated 
by the look-ahead Lanczos process after n = 40, 80, 160, 320 steps. We note that 
after 40 steps, the complex conjugate pair of Ritz values with maximal real part had 
converged to eigenvalues of A. After 80 steps, 12 Ritz values (all on the right edge of 
the spectrum) had converged, while after 160 steps the 30 Ritz values (24 on the right 
edge and 6 on the left edge of the spectrum) had converged to eigenvalues of A. For 
this example, we have used unit vectors with random coefficients as starting vectors 
vi, wi for Algorithm 3.1. 

Example 6.2. This example is an eigenvalue problem, taken from [23], whose exact 
eigenvalues are known. Generally, problems of this type arise in modeling concentra- 
tion waves in reaction and transport interaction of chemical solutions in a tubular 
reactor. The particular test problem used here corresponds to the so-called Brussela- 
tor wave model. This example was run for a matrix A of size N = 100. The look-ahead 
Lanczos algorithm needs n = 112 steps to obtain all the eigenvalues of A\ it builds 
2 blocks of size 2. For this example, we have also run the standard Lanczos process 
without look-ahead, and computed the Ritz values after n = 100, 112, 120 steps. The 
denominators w^w n were checked to exceed \ft in magnitude. In all three cases, some 
of the Ritz values obtained from the standard Lanczos process after deleting spurious 
and ghost eigenvalues do not correspond to any of the eigenvalues of A. In particular, 
the standard Lanczos process does not obtain the smallest eigenvalues of A even after 
120 steps, and generates incorrect Ritz values, as shown in the plot. In Figure 6.2, 
we plot the Ritz values generated by the look-ahead Lanczos process (marked by “o”) 
and the Ritz values generated by the standard Lanczos process (marked by “+”), both 
after 120 steps. 

Example 6.3. Here we consider a 6-cyclic matrix 

0 0 0 0 Bi ' 

h 0 0 0 0 

B$ I3 0 0 0 

0 B a h 0 O’ 

0 0 Bi 1$ 0 

0 0 0 Be I 6 . 


( 6 . 6 ) 


A = 


h 

B 2 

0 

0 

0 

L 0 
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where the diagonal blocks /i , I 2 , I 3 , I 4 , and I& are identity matrices of size 
827, 844, 827, 838, 831, and 838, respectively, so that A is a matrix of order 
N = 5005. This matrix arises in Markov chain modeling. For general p-cyclic ma- 
trices A of the form ( 6 . 6 ), Freund, Golub, and Hochbruck [9] have shown that work 
and storage of the look-ahead Lanczos process can be reduced to approximately 1/p, 
as compared to arbitrary starting vectors, if tq and u>\ have only one nonzero block 
conforming to the block structure of A. Here, we have chosen 

*«[?]■ 



where fi , gi G R 827 have random entries. The look-ahead Lanczos algorithm generates 
blocks that alternately have sizes 1 and 5 , starting with a block of size 1. In Figure 6.3, 
we plot the Ritz values (marked by “o”) generated by the look-ahead Lanczos process 
after n = 40, 80, 160, 320 steps. The standard Lanczos algorithm without look-ahead 
generates one Ritz value 1 in the first step, and then breaks down in the second step. 
Clearly, this example shows that the use of look-ahead is crucial if one wants to exploit 
the special structure of p-cyclic matrices. 

Example 6. 4- Here we consider the partial differential equation 


(6.7) 

where 


Ln = — 


Lu = f on (0, 1) x (0,1) x (0,1), 
d_ 

dx 


U Sx) SyY Sy) d, V s,j 

+ «. + ,+ + I + I + # + z J “■ 

with Dirichlet boundary conditions u = 0, The right-hand side / is chosen such that 
U = (l- *)(1 - y)(l - Z) (1 - e- x ) (1 - e-y) (l - e~') 


is the exact solution of (6.7). We set the parameters in (6.7) to /? — 30 and y — —250, 
and then we discretize (6.7) using centered differences on a uniform 15 x 15 x 15 grid 
with mesh size h = 1/16. This leads to a linear system (6.1) with coefficient matrix 
A of order N = 3375 and 22275 nonzero elements. For the first left Lanczos vector, 
we have chosen w\ = vi in Algorithm 3.1. The QMR approach takes n = 149 steps 
to reduce the norm of the initial residual by a factor of 10" 6 ; see Figure 6.4, where 
the relative norm ||r*|| / ||r 0 || is plotted versus n (solid line). Finally, we note that the 
matrix A is just the one used in Example 4.2. Recall that the look-ahead Lanczos 
algorithm based on the check (4.1) with tolerance tol = ^ 6.0E— 08 encountered 

an artificial incurable breakdown. We also ran QMR based on this version of the 
look-ahead Lanczos algorithm, and the resulting convergence curve is shown as the 
dotted line in Figure 6.4. Notice that, due to the artificial incurable breakdown, QMR 
does not converge in this case (cf. Figure 4.2). ; ^ 

7. Conclusion. We have proposed an implementation of the look-ahead Lanczos 
algorithm for non-Hermitian matrices. Our implementation can handle look-ahead 
steps of any length. Also, the proposed algorithm requires the same number of inner 
products as the standard Lanczos process without look-ahead. It was our intention to 
develop a robust algorithm which can be used in a black box. 

FORTRAN 77 codes of our implementation of the look-ahead Lanczos algorithm 
and the QMR method are available electronically from the authors (na.freund@na- 
net.ornl.gov or na.nachtigal@na-net.ornl.gov). 
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Figure 4.1. <r m in (<?t( n )) (dots), nun 1 <j k< i( n ) (<r m in ($*)) (solid line), and 
y/n cT ra j n (*(")) (dotted line), plotted versus the iteration index n 
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Figure 6.2. Ritz values (marked by u o” respectively "+") for Example 6.2, obtained 
from the look-ahead respectively standard Lanczos algorithm 
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Figure 6.3. Ritz values for Example 6.3, obtained after n = 40, 80, 160, 320 steps 
of the look-ahead Lanczos algorithm 
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Figure 6.4. Relative residual norm ||r„||/||ro|| plotted 
versus n t for Example 6.4 









